release v0.90.0 #1421

dcshzj · 2024-06-27T07:46:40Z

New

feat(monitoring): add scheduler functionality #1383
feat(monitoring): add dns reporter #1376
chore: update slack-bolt #1415
fix: package.json & package-lock.json to reduce vulnerabilities #1414
backport v0.89.0 #1412

Dependencies

chore(deps): minor upgrade aws #1416
chore(deps): bump braces from 3.0.2 to 3.0.3 #1413

Tests

feat(monitoring): add scheduler functionality `#1383`

On deployment, assert that you see the scheduler logs (screenshot in the PR description for `#1383`). It is fine for this log to appear multiple times, once per instance that we run, since BullMQ only creates one queue and one repeatable job across multiple instances.

feat(monitoring): add dns reporter `#1376`

In `server.ts`, add `monitoringService.driver()`. The DNS records should then appear in the logs (screenshot in the PR description for `#1376`).

Full Changelog: https://github.com/isomerpages/isomercms-backend/compare/v0.89.0..v0.90.0

backport v0.89.0

The following vulnerabilities are fixed with an upgrade:

- https://snyk.io/vuln/SNYK-JS-WS-7266574

Co-authored-by: snyk-bot <[email protected]>

Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.

- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](micromatch/braces@3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Co-authored-by: Alexander Lee <[email protected]>

## Problem

This is the first PR to add some level of sane reporting. While scheduling is part of this feature, it is not within the scope of this PR. This PR only adds logic (currently dead code) to grab the domains that we own in Isomer and do a DNS dig on them. This is meant to be verbose, and in the future alarms can be added based on the results.

This is not meant to replace monitoring. It is meant to cover some blind spots that Uptime Robot currently has, and to act as a sanity check during incident response by showing the history of DNS records for a site that we manage. I am opting to log directly in our backend to keep things simple. Alarms and the scheduler will be added in subsequent PRs.

## Solution

Grab ALL domains from KeyCDN, Amplify, and the redirection records, then log the DNS records for them.

**Breaking Changes**

- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)

## Tests

In `server.ts`, add `monitoringService.driver()`. You should see this in the logs:

![Screenshot 2024-05-15 at 5.48.05 PM.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/4JosFH65rhzwIvkZw2J6/2bf61e7f-0ec4-466f-87b7-ec7e1d84993e.png)

## Deploy Notes

**New environment variables**:

- `KEYCDN_API_KEY` : to get all the zones that we own in KeyCDN
- `S3_BUCKET_NAME` : bucket name
- [ ] HAVE NOT added env var to 1PW + SSM script (`fetch_ssm_parameters.sh`)

**New scripts**:

- `script` : script details

**New dependencies**:

- `dependency` : dependency details

**New dev dependencies**:

- `dependency` : dependency details
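The reporter described above merges domains from several sources and logs DNS records for each. A minimal sketch of that shape follows; it is not the code from the PR. The names `mergeDomains`, `buildDnsReport`, and `RecordLookup` are hypothetical, and the lookups are injected so the sketch runs without network access. In Node, `dns.promises.resolve4` and `dns.promises.resolveCname` fit the `RecordLookup` signature.

```typescript
// Hypothetical lookup signature; Node's dns.promises.resolve4 and
// dns.promises.resolveCname both resolve a hostname to string[].
type RecordLookup = (domain: string) => Promise<string[]>

interface DnsReport {
  domain: string
  a: string[]
  cname: string[]
}

// Merge domain lists from several sources (e.g. KeyCDN, Amplify, redirection
// records), normalising case and whitespace and dropping duplicates.
const mergeDomains = (...sources: string[][]): string[] => {
  const seen = new Set<string>()
  for (const source of sources) {
    for (const domain of source) {
      const normalised = domain.trim().toLowerCase()
      if (normalised) seen.add(normalised)
    }
  }
  return Array.from(seen).sort()
}

// A missing record type usually rejects (e.g. ENODATA), so a failed lookup is
// reported as an empty list instead of aborting the whole report.
const safeLookup = async (
  lookup: RecordLookup,
  domain: string
): Promise<string[]> => {
  try {
    return await lookup(domain)
  } catch {
    return []
  }
}

// Build one report entry per domain; the caller decides how to log it.
const buildDnsReport = (
  domains: string[],
  lookupA: RecordLookup,
  lookupCname: RecordLookup
): Promise<DnsReport[]> =>
  Promise.all(
    domains.map(async (domain) => ({
      domain,
      a: await safeLookup(lookupA, domain),
      cname: await safeLookup(lookupCname, domain),
    }))
  )
```

Treating a failed lookup as an empty record list keeps the output verbose and complete, which suits the stated goal of having a DNS history to consult during incident response.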

## Problem

This is the second part of the monitoring feature that we want to build. This PR only adds a scheduler and the related infra needed for it to function. It makes the monitor run once every 5 minutes, so that on-call engineers can pick up any related alarms. Adding the alarms is done in the downstream PR.

## Solution

Use BullMQ to conveniently create a queue, a worker, and a repeatable job across multiple instances. We do some level of exponential backoff retries since it is a nice-to-have and easy to implement.

The original `/site-up` code has since been refactored to return an `err` or an `ok`, depending on whether the configuration is ideal. Unfortunately, this caused quite a number of edge cases to pop up. Because of this, a looser check of whether the Isomer logo is present is used to determine if a site is up.

Even with this looser check, `workplacelearning.gov.sg` has modified its site to not have the Isomer logo. GrowthBook (gb) is used to whitelist this site. If, in the future, we get an alarm for a site going down and the outage is expected to be prolonged, we can go to GrowthBook and change the config so that the site is whitelisted.

**Breaking Changes**

- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)

## Tests

<img width="951" alt="Screenshot 2024-05-21 at 11 20 59 AM" src="https://github.com/isomerpages/isomercms-backend/assets/42832651/2a79df20-75c5-4c47-8d69-f030ca64cf3d">

On deployment, assert that you see these logs. It is fine for this log to appear multiple times, once per instance that we run, since BullMQ only creates one queue and one repeatable job across multiple instances.

## Deploy Notes

The corresponding infra PR should be deployed to production first, and only then should the Redis host value be populated into 1PW for production.

Additionally, after this PR is approved, add two alarms: one for `Error running monitoring service` and another for `Monitoring service has failed`. These cover the case where the job has failed to be initialised and the case where there is a new error.

**New environment variables**:

- `REDIS_HOST` : Redis host
- [ ] added env var to 1PW + SSM script (`fetch_ssm_parameters.sh`)

**New dependencies**:

- `bullmq` : scheduler of choice
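To make the schedule and retry behaviour concrete, here is a minimal sketch of the job options involved; it is not the code from the PR. The `MonitorJobOptions` interface is a local stand-in that assumes the shape of BullMQ's `repeat.every`, `attempts`, and `backoff` job options. The retry count and base delay are hypothetical, since the PR does not state them, and the doubling formula is an assumption about how an exponential strategy is typically computed.

```typescript
// Local stand-in type; assumed to mirror the shape of BullMQ job options.
interface MonitorJobOptions {
  repeat: { every: number } // interval between runs, in milliseconds
  attempts: number // total attempts before the job is marked as failed
  backoff: { type: "exponential"; delay: number } // base delay in milliseconds
}

const FIVE_MINUTES_MS = 5 * 60 * 1000

// The 5 minute interval comes from the PR text; `attempts` and `delay` are
// hypothetical values chosen for illustration only.
const monitorJobOptions: MonitorJobOptions = {
  repeat: { every: FIVE_MINUTES_MS },
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 },
}

// Delay before the retry that follows failed attempt `attemptsMade` (1-based).
// Assumption: the delay doubles with each failed attempt.
const exponentialBackoffDelay = (
  baseDelayMs: number,
  attemptsMade: number
): number => Math.round(baseDelayMs * Math.pow(2, attemptsMade - 1))

// Print the delay that would follow each of the first three failed attempts.
for (let attempt = 1; attempt <= 3; attempt += 1) {
  const delay = exponentialBackoffDelay(monitorJobOptions.backoff.delay, attempt)
  console.log(`after failed attempt ${attempt}: wait ${delay} ms`)
}
```

In a real service, options of this shape would be passed when the repeatable job is added to the queue. Because the job is registered once in Redis, each instance can run the same setup code without creating duplicate schedules, which matches the note above about seeing one log line per instance.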

release v0.90.0

alexanderleegs and others added 8 commits June 13, 2024 14:57

Merge pull request #1412 from isomerpages/release_v0.89.0

1ad9b85

backport v0.89.0

fix: package.json & package-lock.json to reduce vulnerabilities (#1414)

52d81bc

The following vulnerabilities are fixed with an upgrade:

- https://snyk.io/vuln/SNYK-JS-WS-7266574

Co-authored-by: snyk-bot <[email protected]>

chore: update slack-bolt (#1415)

e2a0858

Co-authored-by: Alexander Lee <[email protected]>

chore(deps): minor upgrade aws (#1416)

e13ac7c

chore: bump version to v0.90.0

10d7f32

dcshzj mentioned this pull request Jun 27, 2024

backport v0.90.0 #1422

Merged

kishore03109 self-requested a review June 27, 2024 08:13

kishore03109 approved these changes Jun 27, 2024

kishore03109 added this pull request to the merge queue Jun 27, 2024

Merged via the queue into master with commit d547f28 Jun 27, 2024
43 of 44 checks passed

kishore03109 deleted the release_v0.90.0 branch June 27, 2024 08:23

seaerchin pushed a commit that referenced this pull request Sep 24, 2024

Merge pull request #1421 from isomerpages/release_v0.90.0

91a7f0b

release v0.90.0
